Collection-Oriented Scientific Workflows for Integrating and Analyzing Biological Data
Abstract
Steps in scientific workflows often generate collections of results, so the data flowing through a workflow becomes increasingly nested. Because conventional workflow components (or actors) typically operate on simple or application-specific data types, additional actors are often required to manage these nested data collections. As a result, conventional workflows grow more complex as the data become more nested. This paper describes a new paradigm for developing scientific workflows that manages nested data collections transparently. Collection-oriented workflows have several advantages over conventional approaches, including simpler workflow designs (e.g., requiring fewer actors and control-flow constructs) that are invariant under changes in data nesting. Our implementation within the Kepler scientific workflow system enables the explicit representation of collections and collection schemas, supports concurrent operation over collection contents via multi-level pipeline parallelism, and allows collection-aware actors to be composed readily from conventional actors.
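To make the idea of nesting-invariant actors concrete, here is a minimal Python sketch. It is not Kepler's API and not the authors' implementation; the token classes (`Open`, `Close`, `Data`) and the helpers `flatten`, `collection_aware`, `rebuild`, and `run` are hypothetical names chosen for illustration. It assumes a nested collection can be serialized into a flat token stream with paired delimiters, and that a conventional per-item function can be wrapped so that it transforms data tokens and passes delimiters through untouched.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Iterator, Union


@dataclass(frozen=True)
class Open:
    """Hypothetical delimiter token marking the start of a collection."""
    label: str


@dataclass(frozen=True)
class Close:
    """Hypothetical delimiter token marking the end of a collection."""
    label: str


@dataclass(frozen=True)
class Data:
    """Hypothetical token carrying a single data item."""
    value: Any


Token = Union[Open, Close, Data]


def flatten(label: str, items: list) -> Iterator[Token]:
    """Serialize a nested list into a flat stream of delimited tokens."""
    yield Open(label)
    for item in items:
        if isinstance(item, list):
            yield from flatten("sub", item)
        else:
            yield Data(item)
    yield Close(label)


def collection_aware(func: Callable[[Any], Any]):
    """Wrap a conventional per-item function as a stream actor.

    Data tokens are transformed; delimiters pass through unchanged,
    so the actor never needs to know how deeply the data is nested.
    """
    def actor(stream: Iterable[Token]) -> Iterator[Token]:
        for tok in stream:
            yield Data(func(tok.value)) if isinstance(tok, Data) else tok
    return actor


def rebuild(stream: Iterable[Token]) -> list:
    """Reassemble nested lists from a delimited token stream."""
    stack = [[]]
    for tok in stream:
        if isinstance(tok, Open):
            stack.append([])
        elif isinstance(tok, Close):
            finished = stack.pop()
            stack[-1].append(finished)
        else:
            stack[-1].append(tok.value)
    return stack[0][0]


# Two "conventional" functions lifted into collection-aware actors.
to_upper = collection_aware(str.upper)
length = collection_aware(len)


def run(data: list) -> list:
    """The same two-actor pipeline, regardless of nesting depth."""
    return rebuild(length(to_upper(flatten("root", data))))


flat_result = run(["acgt", "ttga"])
nested_result = run([["acgt"], [["ttga", "cc"]]])
```

The pipeline in `run` is identical for the flat and the nested input, which is the sense in which the design stays invariant under changes in data nesting. Because the actors are chained generators, each token moves through the whole pipeline before the next is read; this only loosely suggests streaming and is not a model of the multi-level pipeline parallelism described above.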
Related resources
Project Histories: Managing Data Provenance Across Collection-Oriented Scientific Workflow Runs
While a number of scientific workflow systems support data provenance, they primarily focus on collecting and querying provenance for single workflow runs. Scientific research projects, however, typically involve (1) many interrelated workflows (where data from one or more workflow runs are selected and used as input to subsequent runs) and (2) tasks between workflow runs that cannot be fully a...
A Compiler Toolchain for Distributed Data Intensive Scientific Workflows
By Peter Bui. With the growing amount of computational resources available to researchers today and the explosion of scientific data in modern research, it is imperative that scientists be able to construct data processing applications that harness these vast computing systems. To address this need, I propose applying concepts from traditional compilers, linkers, and profilers to the constructio...
Provenance in collection-oriented scientific workflows
We describe a provenance model tailored to scientific workflows based on the Collection-Oriented Modeling and Design paradigm. Our implementation within the Kepler scientific workflow system captures the dependencies of data and collection creation events on preexisting data and collections, and embeds these provenance records within the data stream. A provenance query engine operates on self-co...
Actor-Oriented Design of Scientific Workflows
Scientific workflows are becoming increasingly important as a unifying mechanism for interlinking scientific data management, analysis, simulation, and visualization tasks. Scientific workflow systems are problem-solving environments, supporting scientists in the creation and execution of scientific workflows. While current systems permit the creation of executable workflows, conceptual modelin...
Decentralised Orchestration of Service-oriented Scientific Workflows
Service-oriented workflows in the scientific domain are commonly composed as Directed Acyclic Graphs (DAGs), formed from a collection of vertices and directed edges. When orchestrating service-oriented DAGs, intermediate data are typically routed through a single centralised engine, which results in unnecessary data transfer, increasing the execution time of a workflow and causing the engine to...
Publication date: 2006